ABSTRACT


Analytical chemists have long been concerned with obtaining optimal
experimental conditions. Robust estimation provides an additional method of
increasing the efficiency of an analytical technique. This is illustrated
for the determination of the "true" value, μ, of a quantity which is
measured with error. The least squares estimator of μ is compared with the
median and Huber estimates over a variety of error distributions in the
vicinity of the Gaussian distribution. Simulation allows examination of the
efficiency of an estimation procedure as a function of the error
distribution. Results are presented which show the least squares estimator
of μ to be much more sensitive to a non-Gaussian error distribution than is
generally realized in the chemical community. Additionally, the arguments
commonly used to support least squares estimation are critically examined.




INTRODUCTION 


Experimental optimization has been an important subject in analytical
chemistry for many years. The term often, though not always, suggests
a technique for increasing the precision of analytical measurements (e.g.,
increased sensitivity, improved reliability, or decreased cost). Examples
of optimization in chemistry range from the development of self-optimizing
instruments (1) to the use of expert systems in methods development (2).

The efficiency of an analytical technique depends on more than just the
precision of the measurement process. Eckschlager and Stepanek (3) have
characterized an analytical system as two relatively independent subsystems.
In the first of these two subsystems, an analytical apparatus extracts
information from a sample and encodes it in an analytical signal (e.g., a
voltage); in the second, this signal is decoded to yield information. The
information gained from a chemical analysis depends on the efficiency of the
overall system, and can be limited by either of the two subsystems. Most of
the optimization done in analytical chemistry has been concerned with the
first subsystem.

The problem of decoding analytical signals lies within the realm of
chemometrics, which has been defined as the discipline of using mathematical
and statistical techniques to extract information from measurements (4).
Chemists often associate chemometrics with sophisticated multidimensional
techniques, expert systems, or artificial intelligence. In spite of very
elegant work in these areas, the vast majority of chemometric techniques
actually used in chemical laboratories are simple univariate statistics,
such as least squares estimates of the mean, standard deviation, or
regression coefficients. These statistics are usually justified in
analytical texts by the assumption of Gaussian, or normal, errors.




The importance of the normal error distribution to least squares
techniques, along with the consequences of departures from this assumption,
has received much attention from statisticians; however, most chemists seem
to be largely unaware of its importance. Ames and Szonyi (5) and Filliben (6)
have warned of the possibility of drawing incorrect conclusions when the
normality assumption is violated, and have proposed the testing of error
distributions. Tests for normality require many more observations than are
generally available in chemical experiments. Even when an adequate number
of data points is available, it is most unusual for a chemist to apply any
normality test. Studies in enzyme kinetics have both supported (7, 8) and
contradicted (9-11) the assumption of normal error distributions in chemical
data. In a particularly impressive study, Clancy (12) examined 250 error
distributions based on 50,000 chemical analyses and found that less than 15%
of the distributions can be considered normal for the purpose of applying
common statistical techniques.

Many statistics books for the research worker deal exclusively with
least squares methods, and invoke the assumption of independent,
normally distributed errors only for the validity of confidence intervals and
statistical tests calculated using least squares results. Thus it is not
surprising that many chemists believe least squares estimates are the
optimum statistics whatever the error distribution. The efficiency of these
estimates decreases rapidly under mild departures from normality, as has
been demonstrated by several recent studies and is discussed in further
detail below. In terminology familiar to the analytical chemist, nonnormal
errors can lead to poor precision in least squares parameter estimates and


Much work is currently underway in statistics on the development of
robust estimation, as illustrated by references 13-15. A statistic is
called robust if it is insensitive to mild departures from the underlying
assumptions and is only slightly inefficient relative to least squares when
these assumptions are true. This inefficiency under ideal circumstances is
often referred to as the premium paid for protection under nonideal
conditions. Additionally, robust methods are resistant to the presence
of any outliers in the data. Unlike statisticians, chemists have paid only
passing attention to these developments. Isenburg (16) has proposed the
method of moments as an alternative to least squares iterative reconvolution
in the analysis of pulse fluorometric data. Phillips and Eyring (17) and
Massart et al. (18) have compared the performance of least squares regression
and robust regression, concluding that robust regression often outperforms
least squares regression in the analysis of chemical data. The main
emphasis of these articles has been the insensitivity of robust
estimation to a small number of errors in the data.

The present paper is concerned with robust estimation as a method of
increasing the efficiency of an analytical technique. This can best be
illustrated by the estimation of the "true" value, μ, of a quantity which is
measured with error. For example, this may be the concentration of Pb in
drinking water. The least squares estimator of μ is compared with robust
estimates over a variety of realistic error distributions in chemistry.
Simulation allows examination of the efficiency of an estimation procedure
as a function of the error distribution. Additionally, the arguments
commonly used to support least squares estimation are critically examined.


EXPERIMENTAL 


Robust estimation. The least squares estimate of μ is the arithmetic
mean. This is often denoted by x̄ and referred to simply as the mean. A
robust estimate of μ can be obtained from the weighted mean of the
observations using the Huber weight function. This is not the only method
of robust estimation, nor necessarily the best, but it will serve to
illustrate the potential advantages of robust estimation. This approach is
also conceptually simple and easy to implement.

Huber's weight function is defined by

\[
w_i =
\begin{cases}
1, & |r_i| \le kS \\
kS/|r_i|, & |r_i| > kS
\end{cases}
\qquad (2)
\]

where r_i is the residual (i.e., the difference between the observed and
predicted responses), k (the tuning constant) determines how harshly large
residuals are treated, and S is an estimate of the standard deviation. The
evaluation of the weights requires an estimate of μ. The initial estimate
used in the present work is the median.
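
As an illustration of the weight function defined above, the following
minimal Python sketch evaluates the Huber weight for a single residual. The
function name huber_weight and its argument names are hypothetical and are
not taken from the original work.

```python
def huber_weight(residual, scale, k=1.5):
    """Huber weight: 1 inside k*scale, (k*scale)/|residual| outside.

    `scale` plays the role of S and `k` is the tuning constant.
    """
    cutoff = k * scale
    abs_r = abs(residual)
    if abs_r <= cutoff:
        return 1.0          # full weight for residuals within k*S
    return cutoff / abs_r   # weight shrinks as the residual grows


# Residuals expressed in units of the scale estimate (S = 1).
for r in (0.5, 1.5, 3.0, 6.0):
    print(r, huber_weight(r, scale=1.0))
```

The weight decreases smoothly rather than dropping to zero, so an unusual
observation is down-weighted instead of being rejected outright.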

The most common measure of standard deviation is the root mean square
of the residuals. This is the optimal estimator for a normal error
distribution, but it rapidly loses its advantages over other estimators under
even slight deviations from normality (19). Additionally, a single large
residual can drastically change the value of the estimator. The measure of
standard deviation used in the present work is the normalized median of the
absolute deviations:

\[ S = 1.48 \cdot \mathrm{median}(|r_i|) \qquad (3) \]
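
The sensitivity of the root mean square to a single large residual, and the
resistance of the normalized median absolute deviation, can be seen in a
short sketch. The data values are invented for illustration, and the
function names mad_scale and rms_scale are hypothetical; residuals are taken
about the median, as in the initial estimate described above.

```python
import math
import statistics


def mad_scale(values):
    """Normalized median of the absolute deviations about the median."""
    center = statistics.median(values)
    return 1.48 * statistics.median([abs(x - center) for x in values])


def rms_scale(values):
    """Root mean square of the residuals about the median."""
    center = statistics.median(values)
    return math.sqrt(sum((x - center) ** 2 for x in values) / len(values))


# Ten invented replicate measurements, then the same set with one gross error.
clean = [9.8, 9.9, 10.0, 10.0, 10.1, 10.1, 10.2, 10.0, 9.9, 10.0]
spoiled = clean[:-1] + [15.0]

for label, data in (("clean", clean), ("one outlier", spoiled)):
    print(label, "RMS:", round(rms_scale(data), 3), "S:", round(mad_scale(data), 3))
```

In this example the single erroneous value inflates the root mean square by
more than a factor of ten while leaving S essentially unchanged.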

Figure 1 shows a graph of Huber's weight with k = 1.5 as a function of
the residual normalized by the standard deviation. Observations within 1.5
standard deviations of the predicted value receive full weight. (For a
normal error distribution, 87% of the errors fall in this region (20).)
Observations outside this range receive smaller weights as they become less
consistent with the remaining observations. The choice of a value for k is
a compromise between two opposing tendencies: smaller values of k are more
efficient for non-Gaussian errors, but less efficient when the errors are
actually from a Gaussian distribution (14).
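
The pieces described above (median starting value, scale S, and Huber
weights with k = 1.5) can be combined into a weighted mean as sketched below.
This is only one plausible reading: the text does not say whether the
weights were iterated or whether S was recomputed, so the sketch assumes an
iteratively reweighted mean with S held fixed at its initial value. The
function name huber_mean and the data are hypothetical.

```python
import statistics


def huber_mean(values, k=1.5, max_iter=100, tol=1e-10):
    """Huber-weighted mean started at the median (assumed iteration scheme)."""
    center = statistics.median(values)
    # Scale from the residuals about the initial (median) estimate; kept fixed.
    scale = 1.48 * statistics.median([abs(x - center) for x in values])
    if scale == 0.0:
        return center  # at least half the points coincide; nothing to reweight
    cutoff = k * scale
    for _ in range(max_iter):
        weights = [1.0 if abs(x - center) <= cutoff else cutoff / abs(x - center)
                   for x in values]
        new_center = sum(w * x for w, x in zip(weights, values)) / sum(weights)
        if abs(new_center - center) < tol:
            return new_center
        center = new_center
    return center


# Invented replicate measurements containing one gross error.
data = [9.8, 9.9, 10.0, 10.0, 10.1, 10.1, 10.2, 10.0, 9.9, 15.0]
print("mean:  ", statistics.fmean(data))
print("median:", statistics.median(data))
print("H15:   ", huber_mean(data))
```

A smaller k would down-weight more of the observations, which is the
trade-off described in the preceding paragraph.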

Simulated Data. Four hundred different error distributions were
simulated on a VAX 8300 computer. Each distribution is a combination of
two Gaussian distributions:

\[ e = (1-\alpha)\,N(0,1) + \alpha\,C\,N(0,1) \qquad (4) \]

where α is the probability of contamination, C is the degree of
contamination, and N(0,1) denotes the standard normal error distribution.
These error distributions, referred to as contaminated normals, are a
mixture of observations from a normal error distribution with σ = 1, drawn
with probability 1 - α, and from an error distribution with σ = C, drawn
with probability α. Values of C less than, equal to, and greater than one
correspond to error distributions narrower than, identical to, and wider
than the standard normal (i.e., Gaussian) distribution. This work used
values in the range 0 < α ≤ 0.20 and 0 < C ≤ 6. Figure 2 presents the
ideal error distribution and the most extreme distribution used. The use
of the standard normal as a reference is completely general and does not
affect the conclusions reached.
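
A contaminated normal of this kind can be sampled by first choosing which
component an error comes from and then drawing from that component. The
sketch below uses Python's built-in generator rather than the generators
used in this work, and the function names are hypothetical. Because both
components have zero mean, the variance of the mixture is
(1 - α) + αC², which equals 8.0 for the most extreme case (α = 0.20, C = 6).

```python
import random


def contaminated_normal(rng, alpha, c):
    """Draw one error: sigma = c with probability alpha, otherwise sigma = 1."""
    sigma = c if rng.random() < alpha else 1.0
    return rng.gauss(0.0, sigma)


def mixture_variance(alpha, c):
    """Theoretical variance of the zero-mean two-component mixture."""
    return (1.0 - alpha) + alpha * c * c


rng = random.Random(12345)  # fixed seed so the run is repeatable
errors = [contaminated_normal(rng, 0.20, 6.0) for _ in range(200_000)]
sample_variance = sum(e * e for e in errors) / len(errors)

print("theoretical variance:", mixture_variance(0.20, 6.0))
print("sample variance:     ", sample_variance)
```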

Gaussian errors were generated by combining the methods of Wichmann
and Hill (21) and Beasley and Springer (22). Three simple multiplicative
congruential generators produce numbers uniformly distributed between 0 and
1. These random numbers are transformed into normal random deviates by the
method of Beasley and Springer. Both algorithms are written in FORTRAN and
are machine-independent. A histogram of 1000 simulated errors is shown in
Figure 3, along with the theoretical distribution. Agreement between the
two is excellent.
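
The two-stage scheme (uniform deviates from three multiplicative congruential
generators, then an inverse normal transformation) can be sketched in Python
as follows. The multipliers and moduli are those of the published
Wichmann-Hill algorithm (AS 183). The inverse normal step here uses
statistics.NormalDist.inv_cdf from the Python standard library in place of
the Beasley and Springer routine, so this is a stand-in rather than a
transcription of the FORTRAN code used in this work; the class name and seed
values are hypothetical.

```python
import statistics


class WichmannHill:
    """Combined generator built from three multiplicative congruential streams."""

    def __init__(self, s1=1234, s2=5678, s3=9012):
        # Seeds should be non-zero integers below the respective moduli.
        self.s1, self.s2, self.s3 = s1, s2, s3
        self._std_normal = statistics.NormalDist(0.0, 1.0)

    def uniform(self):
        """Return a deviate uniformly distributed on [0, 1)."""
        self.s1 = (171 * self.s1) % 30269
        self.s2 = (172 * self.s2) % 30307
        self.s3 = (170 * self.s3) % 30323
        return (self.s1 / 30269.0 + self.s2 / 30307.0 + self.s3 / 30323.0) % 1.0

    def normal(self):
        """Transform a uniform deviate into a standard normal deviate."""
        u = self.uniform()
        while u <= 0.0:  # inv_cdf is undefined at exactly 0
            u = self.uniform()
        return self._std_normal.inv_cdf(u)


gen = WichmannHill()
errors = [gen.normal() for _ in range(1000)]
print("mean:", statistics.fmean(errors))
print("standard deviation:", statistics.pstdev(errors))
```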

RESULTS 

This paper considers three statistics, each of which is a valid
estimator of μ. However, the statistics are not equally effective in
extracting the information encoded in analytical signals. Each estimator
is a function of several random variables, and is therefore a random
variable itself. By repeatedly simulating sets of "experimental
measurements", it is possible to generate the distribution of the estimates
themselves.

For each error distribution, 5000 simulated data sets (each containing
10 observations) were analyzed by the arithmetic mean, median, and H15
estimators. (H15 is shorthand notation for the weighted mean using Huber
weights with k = 1.5.) The variance of each procedure was evaluated for
each error distribution (i.e., each combination of α and C). For example,
the variance of the arithmetic mean is given by

\[ \mathrm{Var}(\mathrm{mean};\alpha,C) = \sum_{i=1}^{5000} (\bar{x}_i - \mu)^2 / 5000 \qquad (5) \]

where x̄_i is the arithmetic mean of the i-th simulated data set. The
efficiencies of the Huber and median estimators are defined relative to
the arithmetic mean by

\[ \mathrm{Eff}(\mathrm{H15};\alpha,C) = \mathrm{Var}(\mathrm{mean};\alpha,C) / \mathrm{Var}(\mathrm{H15};\alpha,C) \qquad (6) \]

\[ \mathrm{Eff}(\mathrm{median};\alpha,C) = \mathrm{Var}(\mathrm{mean};\alpha,C) / \mathrm{Var}(\mathrm{median};\alpha,C) \qquad (7) \]
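
The calculation in equations 5-7 can be outlined in a few lines of Python.
This is a sketch under stated assumptions, not the program used in this
work: it uses Python's built-in random number generator, takes the true
value as μ = 0, and assumes the H15 estimate is an iteratively reweighted
mean with the scale S fixed at its initial value. All function names are
hypothetical, and the numbers it prints will vary with the seed and will not
reproduce the reported values exactly.

```python
import random
import statistics


def huber_mean(values, k=1.5, max_iter=100, tol=1e-10):
    """Huber-weighted mean started at the median, with a fixed scale S."""
    center = statistics.median(values)
    scale = 1.48 * statistics.median([abs(x - center) for x in values])
    if scale == 0.0:
        return center
    cutoff = k * scale
    for _ in range(max_iter):
        weights = [1.0 if abs(x - center) <= cutoff else cutoff / abs(x - center)
                   for x in values]
        new_center = sum(w * x for w, x in zip(weights, values)) / sum(weights)
        if abs(new_center - center) < tol:
            return new_center
        center = new_center
    return center


def relative_efficiencies(alpha, c, n_obs=10, n_sets=5000, seed=1):
    """Estimate Eff(H15) and Eff(median) for one (alpha, C) pair, with mu = 0."""
    rng = random.Random(seed)
    estimators = {"mean": statistics.fmean,
                  "median": statistics.median,
                  "H15": huber_mean}
    sum_sq = {name: 0.0 for name in estimators}
    for _ in range(n_sets):
        data = []
        for _ in range(n_obs):
            sigma = c if rng.random() < alpha else 1.0
            data.append(rng.gauss(0.0, sigma))
        for name, estimator in estimators.items():
            sum_sq[name] += estimator(data) ** 2  # squared error about mu = 0
    variance = {name: total / n_sets for name, total in sum_sq.items()}
    return {"H15": variance["mean"] / variance["H15"],
            "median": variance["mean"] / variance["median"]}


# Fewer data sets than in the text, to keep the example quick.
print("Gaussian errors:      ", relative_efficiencies(0.0, 1.0, n_sets=2000))
print("alpha = 0.20, C = 6.0:", relative_efficiencies(0.20, 6.0, n_sets=2000))
```

Applying all three estimators to the same simulated data sets keeps the
comparison paired, so the efficiency ratios are less noisy than the
individual variances.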

The relative efficiencies of the H15 and median estimators are shown in
Figures 4 and 5, respectively. The increase in precision is particularly
dramatic considering the narrow range of distributions studied around exact
Gaussian errors (see Figure 2). Each error distribution
studied was "close" to Gaussian and symmetric. Introduction of asymmetry
would have further deteriorated the precision of the mean (14).

The relative efficiency measures the precision of an estimator, such
as the Huber or median, relative to the mean for the same number of
observations. For an ideal Gaussian error distribution, the relative
efficiencies of the H15 and median estimators would be 0.95 and 0.67,
respectively. Under the most extreme conditions studied in this work, the
relative efficiencies of the H15 and median estimators were 3.25 and 2.73,
respectively. Thus, the variance of the estimated value of μ using the
arithmetic mean is, on average, 3.25 times that of the Huber estimator.


Figure 6 shows a contour plot of the relative efficiency of the H15
estimator as a function of the probability of contamination and degree of
contamination. Dashed contours denote regions where the arithmetic mean is
more precise, while solid lines denote regions where the H15 estimator is
more precise. In view of the greatly enhanced precision of robust
estimation under slight deviations from normality, the small premium under
ideal conditions appears well worth paying for the improved efficiency of
robust estimation under nonideal conditions.

DISCUSSION 

The  prevalent  attitude  among  chemists  seems  to  be  that  rejection  of 
erratic  data  points  provides  sufficient  protection  against  nonnormal  error 
distributions  and  justifies  the  automatic  use  of  least  squares  procedures. 
The  reasons  given  in  support  of  least  squares  estimators  deserve 
examination.  Least  squares  statistics  are  easy  to  compute;  in  fact,  this 
was  one  reason  for  the  historical  acceptance  of  least  squares.  However, 
with  the  proliferation  of  laboratory  microcomputers,  or  even  pocket 
calculators,  ease  of  computation  is  no  longer  of  primary  importance. 

A second reason for the widespread belief in least squares is a
misinterpretation of the Gauss-Markov theorem (23). This theorem
states that the best linear unbiased estimate of μ is the sample mean,
whatever the error distribution. This is frequently interpreted by
nonstatisticians to mean that the sample mean is the best of all
estimators. The important words in the Gauss-Markov theorem are linear and
unbiased. A linear estimator is one which is a linear combination of the
observed values. However, there is no inherent reason to require
linearity. As has been shown, insistence on linearity can result in a loss
of precision.

Since least squares is the optimum estimation procedure for normally
distributed errors, a third argument is that it should be almost optimum
when the errors are approximately normal. The Central Limit Theorem states
that the sum of a "large" number of independent random variables (i.e.,
errors) is approximately normal regardless of the distribution of the
individual random variables (20). Experimental errors are the sum of many
small independent errors. However, these small errors often have widely
different variances, and the "approximately" normal distribution of their
sum is closer to a long-tailed distribution. Studies over the past 15
years have shown the arithmetic mean to be significantly less efficient in
these situations. The error distributions used in this work have only
slightly longer tails than the normal distribution, yet clearly demonstrate
the loss of precision in the arithmetic mean.

Finally, it is interesting to compare the present relationship between
the arithmetic mean and the normal error distribution with the historical
relationship. Gauss (23) introduced the normal, or Gaussian, error
distribution in 1821. He argued that it was impossible to determine the
most probable value of an unknown quantity unless its error distribution
was known. Without such knowledge, the only recourse was to assume a
distribution in a "hypothetical" fashion. Gauss preferred to take the
opposite approach and to look for that distribution which would make the
arithmetic mean the best estimator. Thus, the arithmetic mean was used to
justify the normal error distribution.

The method of least squares has proven very useful for many years.
This procedure is often motivated as being the maximum likelihood estimator
for a Gaussian error distribution. Methods for robust estimation do not
represent an abandonment of traditional data reduction procedures.
Estimation using robust weights is attractive since it represents the
maximum likelihood estimator over a range of distributions in the
"vicinity" of Gaussian. Thus, the attractive features of the Huber
estimator do not depend on the existence of an idealized error
distribution.

CONCLUSION 

Techniques based on the principle of least squares are the optimal
estimation procedures for the analysis of data possessing a normal error
distribution, but perform very poorly in situations involving a nonnormal
error distribution (see, e.g., reference 14). Almost every aspect of the
measurement process has been examined during optimization procedures.
However, the validity of the assumption of normal errors has received little
attention from chemists. The present work has demonstrated that even small
deviations from normality can seriously degrade the efficiency of least
squares estimators. Only symmetric error distributions have been examined
here; more serious problems arise when the error distribution becomes
asymmetric. The deviations are small enough to occur frequently in
practice. The effect can be to decrease the precision of an
analytical method or instrument which has been carefully optimized.

Robust estimation is a complementary technique which is relatively
efficient over a broad range of error distributions. This approach takes
advantage of the "a priori" knowledge that errors in chemistry lie within a
range of distributions, while avoiding the inefficiency which results from
rigid assumptions about the error distribution. These procedures more
closely reflect real situations, recognizing that even in careful work the
distribution of errors is not always ideal. Robust procedures do not
change the focus of data analysis; rather, they are an efficient alternative
method of accomplishing traditional goals. The exact robust procedure used
is not as important as the use of some robust method. This can be a newer
robust approach, such as the Huber weight function, or a more traditional
method of examining the validity of least squares.

Robust  methods  should  not  be  regarded  as  a  completely  automatic 
procedure  or  a  substitute  for  a  reasonable  amount  of  statistical  knowledge, 
however.  Measurements  which  have  been  assigned  small  robust  weights  have 
been  marked  for  special  attention,  including  examination  of  the 
appropriateness  of  the  error  model  as  well  as  the  possibility  of  erroneous 
data  points. 

It  is  not  the  contention  of  this  paper  that  improved  statistical 
techniques,  such  as  robust  estimation,  are  a  substitute  for  good  analytical 
data.  No  statistical  technique  can  extract  high  quality  results  from  low 
quality  data.  If  the  measurement  process  is  not  in  control,  an  analyst 
will  benefit  most  by  restoring  the  experimental  conditions  to  their  optimum 
values.  Conversely,  when  a  measurement  process  is  in  control,  analytical 
precision  can  be  limited  by  application  of  inefficient  statistical 
procedures.  Robust  estimation  is  one  method  of  detecting  incorrect 
statistical  models  and/or  error  distributions.  It  has  the  advantages  of 
being  easily  implemented  and  understood. 
Figure Captions

Figure 1. Plot of the Huber weight function with a tuning constant equal to
1.5. The dashed line is the probability density for the Gaussian
error function.

Figure 2. A plot of the Gaussian error distribution and a contaminated
distribution with α = 0.20 and C = 6.

Figure 3. Histogram of 1000 simulated errors. Superimposed is the
theoretical distribution for normal, or Gaussian, errors.

Figure 4. The relative efficiency of robust estimation using the Huber
weight function with k = 1.5 as a function of the probability of
contamination, α, and the degree of contamination, C.

Figure 5. The relative efficiency of the median estimator as a function of
the probability of contamination, α, and the degree of
contamination, C.

Figure  6.  A  contour  plot  of  the  relative  efficiency  of  the  H15  estimator  on 
contaminated  normals.  Dashed  lines  correspond  to  efficiencies 
less  than  one;  solid  lines  correspond  to  efficiencies  greater 
than  one. 